feat: add timeout param to client Dataset.encode and document num_proc #232
Code Review
This pull request introduces a default timeout of 600 seconds for the encode method in the client generator, updating both Dataset and LazyDataset classes, and documents this change along with multi-process parallelism (num_proc) in the documentation. It also adds save_as and flush_save methods to the base dataset client. The reviewer feedback suggests improving the robustness of the timeout injection by allowing Optional[int] to disable timeouts, avoiding duplicate injection if the parameter is already present, and restricting the injection specifically to the Dataset and LazyDataset classes to prevent unexpected behavior in other classes with an encode method.
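The three review suggestions can be illustrated with a minimal sketch. It assumes the generator handles a method's parameters as a list of source strings; the helper name `inject_timeout`, the `TIMEOUT_CLASSES` allow-list, and that list-of-strings representation are hypothetical and are not taken from `client_tools/client_generator.py`.

```python
from typing import List

# Hypothetical allow-list: the review names only these two classes.
TIMEOUT_CLASSES = {"Dataset", "LazyDataset"}
# Optional[int] so that a caller could pass None to disable the timeout.
TIMEOUT_PARAM = "timeout: Optional[int] = 600"


def inject_timeout(class_name: str, method_name: str, params: List[str]) -> List[str]:
    """Return a copy of params with a timeout parameter added when appropriate."""
    # Restrict injection to encode on the allow-listed classes.
    if class_name not in TIMEOUT_CLASSES or method_name != "encode":
        return list(params)

    # Skip injection if the method already declares a timeout parameter.
    names = {p.split(":")[0].split("=")[0].strip().lstrip("*") for p in params}
    if "timeout" in names:
        return list(params)

    result = list(params)
    # Keep **kwargs last so the generated signature stays valid Python.
    for index, param in enumerate(result):
        if param.strip().startswith("**"):
            result.insert(index, TIMEOUT_PARAM)
            return result
    result.append(TIMEOUT_PARAM)
    return result
```

With this shape, an `encode` method on an unrelated class is left untouched, and a hand-written `timeout` parameter is never duplicated.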
PR type
PR information
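Add a timeout parameter (default 600 seconds) to twinkle_client.dataset.Dataset.encode and pass it through to the underlying http_post, fixing HTTP request timeouts when tokenizing large datasets or tokenizing with multiple processes.
- build_method in client_tools/client_generator.py: inject timeout: int = 600 only for the encode method and forward it to http_post; all other generated methods are unchanged.
- Auto-generated outputs such as src/twinkle_client/dataset/*.py.
- tests/twinkle_client/test_client_timeout.py: verifies, TDD-style, that both the encode signature and the http_post call carry timeout.
- A note on using num_proc to speed up encode; a timeout usage example added to the Twinkle client documentation.
Usage: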
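A minimal sketch of the call shape and of the two properties the test checks, written against stand-ins rather than the real client. The `Dataset` class, the `http_post` stub, the `/dataset/encode` path, and passing `num_proc` as a keyword argument to `encode` are all assumptions made for illustration; only the `timeout` default of 600 and its forwarding to `http_post` come from the description above.

```python
import inspect
from typing import Any, Dict, List, Optional

# Records every stubbed HTTP call so the forwarding can be inspected.
calls: List[Dict[str, Any]] = []


def http_post(url: str, json_data: Dict[str, Any],
              timeout: Optional[int] = None) -> Dict[str, Any]:
    """Stand-in for the client's http_post: records arguments, sends nothing."""
    calls.append({"url": url, "json_data": json_data, "timeout": timeout})
    return {"status": "ok"}


class Dataset:
    """Stand-in for twinkle_client.dataset.Dataset, not the real class."""

    def encode(self, timeout: int = 600, **kwargs: Any) -> Dict[str, Any]:
        # Forward the timeout to the HTTP layer, as the generated method is described to do.
        return http_post("/dataset/encode", json_data=kwargs, timeout=timeout)


dataset = Dataset()
dataset.encode(num_proc=8)                # uses the 600-second default
dataset.encode(num_proc=8, timeout=1800)  # longer limit for a large dataset

# The signature itself exposes the default, which is what a signature test can check.
default_timeout = inspect.signature(Dataset.encode).parameters["timeout"].default
```

Checking both the signature and the recorded call catches the case where the parameter is accepted but silently dropped before the HTTP request.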